Wayne Xin Zhao

Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards

Dec 25, 2025
Via arXiv

Scaling Laws for Code: Every Programming Language Matters

Dec 15, 2025
Via arXiv

Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding

Nov 12, 2025
Via arXiv

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

Nov 10, 2025
Via arXiv

MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning

Oct 06, 2025
Via arXiv

Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework

Sep 05, 2025
Via arXiv

STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning

Aug 26, 2025
Via arXiv

From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR

Aug 11, 2025
Via arXiv

BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation

Aug 07, 2025
Via arXiv

WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training

Jul 23, 2025
Via arXiv